E. coli do not count single molecules

Organisms must perform sensory-motor behaviors to survive. What bounds or constraints limit behavioral performance? Previously, we found that the gradient-climbing speed of a chemotaxing Escherichia coli is near a bound set by the limited information they acquire from their chemical environments (1). Here we ask what limits their sensory accuracy. Past theoretical analyses have shown that the stochasticity of single molecule arrivals sets a fundamental limit on the precision of chemical sensing (2). Although it has been argued that bacteria approach this limit, direct evidence is lacking. Here, using information theory and quantitative experiments, we find that E. coli’s chemosensing is not limited by the physics of particle counting. First, we derive the physical limit on the behaviorally-relevant information that any sensor can get about a changing chemical concentration, assuming that every molecule arriving at the sensor is recorded. Then, we derive and measure how much information E. coli’s signaling pathway encodes during chemotaxis. We find that E. coli encode two orders of magnitude less information than an ideal sensor limited only by shot noise in particle arrivals. These results strongly suggest that constraints other than particle arrival noise limit E. coli’s sensory fidelity.

throughout are standard error of the mean (see Methods in the main text). B) Correlation time of up-gradient velocity, $\tau_v$, which sets the signal correlation time. Parameters in (A) and (B) are those of the median phenotype in Fig. S2, with tumble bias $TB \approx 0.09$. C) Variance of the total noise in kinase activity, $\sigma_n^2$ (black), and the estimated variance of particle arrival noise filtered through the kinase response kernel (blue) with $\tau_1 = 1/60$ s (1,2). D) Kinase noise correlation time, $\tau_n$, and kinase response adaptation time, $\tau_2$ (blue). E) Gain of kinase response to signal or log-concentration, $G$. F) Gain of kinase response to absolute concentration, $G_c = G_r k_D = G/c_0$, where $G_r$ is the gain of kinase responses to particle arrival rate.

Fig. S2: Swimming parameters as a function of tumble bias in different background concentrations. A) Distribution of tumble bias, $TB = 1 - P_{run}$, or fraction of time cells spend in the tumble state, among cells in an isogenic population. Throughout: red is $c_0 = 0.1$ μM, green is $c_0 = 1$ μM, and blue is $c_0 = 10$ μM. Shading is standard error of the mean (Methods). B) Variance of up-gradient velocity, $\sigma_v^2$, versus tumble bias, $TB$. C) Velocity decorrelation rate, $\lambda_{tot} = \tau_v^{-1} \approx (1 - \alpha)\lambda_{R0} + 2 D_r$, versus $TB$. $\alpha$ quantifies how correlated heading is before and after a tumble; $\lambda_{R0}$ is the average tumble rate; and $D_r$ is the rotational diffusion coefficient (3). D) Velocity correlation time, $\tau_v$.

Fig. S3: Noise power spectra. In frequency space, the kinase response to particle arrivals implies that the noise in kinase activity must be larger than filtered particle arrival noise (blue, using $\tau_1 = 1/60$ s from biochemistry studies, Refs. (1,2)). At low frequencies where we can measure noise and responses with our FRET system (green), this bound is far from saturated. Naively extrapolating to higher frequencies (red shaded region, marked by the value of $1/\tau_1$ measured in FRET experiments) violates this bound (the green line goes below the blue line). This implies either additional noise at high frequencies that is not captured by a single exponential (black line is slow noise, green, plus filtered particle noise, blue) or a slower kinase response time $\tau_1$ (red line is filtered particle noise with $\tau_1 \approx 0.35$ s measured in FRET experiments), which could be a necessary byproduct of the coupling between kinases that creates large gain (thus raising the red line) but also slows down the response. The behaviorally-relevant information rates computed in the main text are relatively insensitive to these choices.

Background: Drift speed and information rate
We recently demonstrated that a cell's drift speed $v_d$ is set by the transfer entropy rate, $\dot{I}^*_{s \to m}$, from current signal $s(t) = \frac{1}{c_0}\frac{dc}{dt}$ to (the trajectory of) swimming behavior $m(t)$ (3). The transfer entropy rate from current signal to swimming behavior is defined as:

$$\dot{I}^*_{s \to m} \equiv \lim_{dt \to 0} \frac{1}{dt} \left\langle \log \frac{P\big(m(t+dt) \mid s(t), \{m(t)\}\big)}{P\big(m(t+dt) \mid \{m(t)\}\big)} \right\rangle.$$

Here, curly brackets denote the entire past of a variable, up to and including time $t$. Angled brackets indicate an average over the joint distribution of $s(t)$, past $m(t)$, and $m(t+dt)$. This quantifies how strongly the swimming transition probabilities depend on the current signal.
The transfer entropy rate from current signal determines the cell's drift speed (3): where $v_0$ is the cell's swimming speed, $\lambda_{R0}$ is the cell's average tumble rate, $\alpha$ is the persistence of the cell's orientation upon tumbling, $D_r$ is the rotational diffusion coefficient, and $P_{run}$ is the fraction of time the cell spends in the run state.
This transfer entropy obeys a series of data processing inequalities (4,5) because of the feed-forward relationship between molecule arrival rate $r(t)$, kinase activity $a(t)$, and swimming behavior $m(t)$:

$$\dot{I}^*_{s \to r} \geq \dot{I}^*_{s \to a} \geq \dot{I}^*_{s \to m}.$$

Thus, information about current signal available in particle arrivals sets a fundamental upper limit on a cell's gradient climbing speed. Because these information rates set the cell's chemotaxis performance, defined as $v_d/v_0$, these transfer entropy rates quantify behaviorally-relevant information.

Equivalence between transfer entropy and predictive information rates
Here we demonstrate that the transfer entropy rates above are equivalent to a predictive information rate, under some assumptions that are satisfied by bacterial chemotaxis. This relationship is useful because it allows us to derive expressions for the behaviorally-relevant information rates above.
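Below, we will write the transfer entropy rate from a signal $s(t)$ to a stochastic process $x(t)$, such as $r(t)$, $a(t)$, or $m(t)$. Starting from the definition above:

$$\dot{I}^*_{s \to x} = \lim_{dt \to 0} \frac{1}{dt}\, I\big(x(t+dt); s(t) \mid \{x(t)\}\big),$$

conditional mutual information can always be written as a difference between two unconditioned mutual information terms:

$$\dot{I}^*_{s \to x} = \lim_{dt \to 0} \frac{1}{dt} \Big[ I\big(\{x(t+dt)\}; s(t)\big) - I\big(\{x(t)\}; s(t)\big) \Big].$$

This can be written as a time derivative:

$$\dot{I}^*_{s \to x} = \left[ \frac{\partial}{\partial \tau} I\big(\{x(t+\tau)\}; s(t)\big) \right]_{\tau = 0}.$$

Next, we use time stationarity to shift time $t$ by $-\tau$:

$$\dot{I}^*_{s \to x} = \left[ \frac{\partial}{\partial \tau} I\big(\{x(t)\}; s(t-\tau)\big) \right]_{\tau = 0}.$$

Finally, we can change variables to $\tau \to -\tau$, giving:

$$\dot{I}^*_{s \to x} = -\left[ \frac{\partial}{\partial \tau} I\big(\{x(t)\}; s(t+\tau)\big) \right]_{\tau = 0}.$$

This last step would not be allowed if the mutual information inside the time derivative involved the entire past of $s$, i.e. $\{s(t+\tau)\}$.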
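The step that rewrites a conditional mutual information as a difference of two unconditioned ones is the chain rule, $I(A; C \mid B) = I(A, B; C) - I(B; C)$. The sketch below checks this identity numerically for jointly Gaussian variables. It is only an illustration: the three-variable covariance matrix is randomly generated and hypothetical, and the helper names `gaussian_mi` and `conditional_cov` are not from the original text.

```python
import numpy as np

def gaussian_mi(cov, idx_x, idx_y):
    """Mutual information (nats) between two blocks of a jointly Gaussian vector."""
    cov = np.asarray(cov, dtype=float)
    joint = idx_x + idx_y
    _, ld_x = np.linalg.slogdet(cov[np.ix_(idx_x, idx_x)])
    _, ld_y = np.linalg.slogdet(cov[np.ix_(idx_y, idx_y)])
    _, ld_xy = np.linalg.slogdet(cov[np.ix_(joint, joint)])
    return 0.5 * (ld_x + ld_y - ld_xy)

def conditional_cov(cov, idx_keep, idx_given):
    """Covariance of the kept variables given the others (Schur complement)."""
    cov = np.asarray(cov, dtype=float)
    s_kk = cov[np.ix_(idx_keep, idx_keep)]
    s_kg = cov[np.ix_(idx_keep, idx_given)]
    s_gg = cov[np.ix_(idx_given, idx_given)]
    return s_kk - s_kg @ np.linalg.solve(s_gg, s_kg.T)

# Hypothetical covariance: index 0 = new sample x(t+dt), 1 = past x(t), 2 = signal s(t).
rng = np.random.default_rng(0)
m = rng.normal(size=(3, 3))
cov = m @ m.T + 0.5 * np.eye(3)

# Left-hand side: I(x_new; s | x_past), computed from the conditional covariance.
cond = conditional_cov(cov, [0, 2], [1])   # 2x2 covariance of (x_new, s) given x_past
mi_conditional = gaussian_mi(cond, [0], [1])

# Right-hand side: I(x_new, x_past; s) - I(x_past; s).
mi_difference = gaussian_mi(cov, [0, 1], [2]) - gaussian_mi(cov, [1], [2])
```

The two quantities agree to numerical precision, which is the property the derivation relies on. In the text the "past" is an entire trajectory rather than a single variable, but the identity has the same form.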
Inside the time derivative above is the "predictive information" (6-8) between the entire past of the stochastic process $x(t)$ up to time $t$ and the signal $s(t+\tau)$ at some time $\tau$ into the future (if $\tau > 0$). This mutual information, or predictive information, is a monotonically decreasing function of $\tau$: the value of the signal $s$ at a time further in the future (larger $\tau$) becomes less correlated with past observations and thus harder to predict.

Derivation of the information in particle arrivals
In this section we derive the information rate from current signal $s(t)$ to past particle counts $r$, which sets a fundamental upper limit on the information rate achievable by a cell. This information rate is given by the following transfer entropy rate:

$$\dot{I}^*_{s \to r} = -\left[ \frac{\partial}{\partial \tau} I\big(\{r(t)\}; s(t+\tau)\big) \right]_{\tau = 0}.$$

Here, $s(t) = \frac{d}{dt}\log(c)$ is the relative rate of change of ligand concentration along the cell's trajectory, and $r(t)$ is the number of ligand molecules per time that arrive at the cell's surface.
The key quantity we need to derive is the mutual information inside of the derivative:

$$I\big(\{r(t)\}; s(t+\tau)\big) = \left\langle \log \frac{P\big(s(t+\tau) \mid \{r(t)\}\big)}{P\big(s(t+\tau)\big)} \right\rangle.$$

In general, it is difficult to derive the conditional distributions above. However, we can make a few simplifying assumptions. First, although the distribution of particle arrival rate $P(\{r(t)\} \mid s(t+\tau))$ has Poisson statistics, if a sufficient number of particles arrive at the cell's receptor array per unit time, the Poisson statistics are approximately Gaussian. This approximation is accurate when the cell sees many more than one particle per run on average.
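The Gaussian approximation to Poisson counting statistics can be checked directly. The sketch below compares a Poisson distribution with the Gaussian of matching mean and variance and reports the largest pointwise gap relative to the peak. The mean counts used (10 and 1000) are illustrative values chosen here, not numbers from the text.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of observing k counts, computed in log space for stability."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def gaussian_pdf(x, mean, var):
    """Gaussian density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def max_relative_gap(mean):
    """Largest |Poisson - Gaussian| within 6 standard deviations, relative to the Gaussian peak."""
    lo = max(0, int(mean - 6 * math.sqrt(mean)))
    hi = int(mean + 6 * math.sqrt(mean)) + 1
    peak = gaussian_pdf(mean, mean, mean)
    gaps = (abs(poisson_pmf(k, mean) - gaussian_pdf(k, mean, mean))
            for k in range(lo, hi + 1))
    return max(gaps) / peak

gap_few = max_relative_gap(10.0)      # few particles per counting window
gap_many = max_relative_gap(1000.0)   # many particles per counting window
```

The gap shrinks roughly as one over the square root of the mean count, which is why the approximation improves as more particles arrive per run.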
We focus on computing a Gaussian approximation of $P(s(t+\tau) \mid \{r(t)\})$. For this, we only need to compute the mean $\mu_{s|r}(\tau)$ and variance $\sigma^2_{s|r}(\tau)$. The mutual information can then be computed from: and the information rate is: Here, $\sigma_v^2 \approx \frac{v_0^2}{3} P_{run}$ is the variance of the cell's up-gradient velocity, $v_0$ is its swimming speed, and $P_{run}$ is the fraction of time it spends in the run state; and $\tau_v$ is the correlation time of the cell's velocity and the signal, $\tau_v^{-1} = (1 - \alpha)\lambda_{R0} + 2 D_r$, where $\lambda_{R0}$ is the cell's baseline tumble rate, $\alpha$ is the directional persistence, and $D_r$ is the rotational diffusion coefficient.
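The two swimming statistics defined in words above, the up-gradient velocity variance (swimming speed squared, divided by three, times the run fraction) and the velocity correlation time (the inverse of one minus persistence times the tumble rate, plus twice the rotational diffusion coefficient), are simple to evaluate. The sketch below does so with hypothetical parameter values picked only to show the arithmetic; they are not the measured values from this work.

```python
def velocity_variance(v0, p_run):
    """Variance of up-gradient velocity: v0^2 / 3 times the fraction of time spent running."""
    return v0 ** 2 / 3.0 * p_run

def velocity_correlation_time(alpha, lambda_r0, d_r):
    """Correlation time of velocity (and signal): 1 / ((1 - alpha) * lambda_R0 + 2 * D_r)."""
    return 1.0 / ((1.0 - alpha) * lambda_r0 + 2.0 * d_r)

# Hypothetical, illustrative parameters (NOT taken from the paper):
v0 = 20.0         # swimming speed, um/s
p_run = 0.9       # fraction of time in the run state
alpha = 0.2       # directional persistence across a tumble
lambda_r0 = 1.0   # baseline tumble rate, 1/s
d_r = 0.05        # rotational diffusion coefficient, 1/s

sigma_v2 = velocity_variance(v0, p_run)                     # um^2/s^2
tau_v = velocity_correlation_time(alpha, lambda_r0, d_r)    # s
```

With these inputs the decorrelation rate is 0.8 + 0.1 = 0.9 per second, so the correlation time is a little over one second.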
The optimal kernel can be expressed in Fourier space terms of the power spectra of the signal () and the particle arrival rate () as (Appendix A): () is the Fourier transform of   (), with the Fourier transform defined as and inverse transform defined as .   () is the causal part of the spectral decomposition of   () (defined below and in Appendix A), where   () is the power spectrum of .  * () is its (anti-causal) complex conjugate.  () is the cross-spectra of  and , equivalent to the Fourier transform of   (), where   () = ⟨(() −  0 )( + )⟩ is the crosscorrelation of  and  in the time domain.Finally, [()] + indicates the causal part of the inverse Fourier transform of (), which can be found by taking the inverse Fourier transform of (), multiplying the result by a Heaviside step function in the time domain, and then taking the Fourier transform.
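The recipe for the causal part (inverse transform, multiply by a Heaviside step, transform back) can be sketched numerically with FFTs. The example below is a minimal illustration with assumptions stated up front: it uses NumPy's sign convention, in which the forward transform carries $e^{-i\omega t}$, so a causal decay $e^{-at}$ maps to $1/(a + i\omega)$; the document's own convention may differ by the sign of $\omega$. The function name `causal_part` and the test function $e^{-a|t|}$ are chosen here for illustration.

```python
import numpy as np

def causal_part(F):
    """Return [F]_+ for samples F[k] laid out on the np.fft.fftfreq frequency grid."""
    f = np.fft.ifft(F)              # back to the time domain
    n = len(f)
    step = np.zeros(n)
    step[: n // 2] = 1.0            # indices 0..n/2-1 hold t >= 0; the rest hold t < 0
    return np.fft.fft(f * step)     # transform the causal piece back

# Known case: f(t) = exp(-a|t|) has causal part exp(-a t) for t >= 0.
a, dt, n = 1.0, 1e-3, 2 ** 15
j = np.arange(n)
t = np.where(j < n // 2, j, j - n) * dt            # FFT-ordered time grid
omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)      # matching angular frequencies

F = dt * np.fft.fft(np.exp(-a * np.abs(t)))        # approximates 2a / (a^2 + omega^2)
F_plus = causal_part(F)
expected = 1.0 / (a + 1j * omega)                  # transform of exp(-a t) * step(t)
```

On this grid the numerical causal part matches the analytic result at low frequencies up to a discretization error of order `dt / 2`, which comes from giving the sample at $t = 0$ full weight.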
To derive these various quantities, we take the Fourier transforms of Eqns.16 and 17, and then solve for the Fourier transforms of our variables () and (): () =   () + √ 0 ().
Here we have introduced a small, dimensionless parameter  ≪ 1 that we will take to zero later.Physically, this is as if the cell experiences a weak restoring force back to regions where concentration () =  0 .Without it, the correlation function of (), which is proportional to the cell's mean squared displacement, would diverge at long times.Everything else remains bounded and well-defined as  goes to zero.
As explained in Appendix A, to find the optimal causal kernel, we need to decompose   () into the product of a causal and an anti-causal part.This requires finding the zeros and poles of   ().The zeros satisfy   ( =    ) = 0, and therefore are the complex solutions to the equation: or, defining   = 2  0  2   2   3 : is a dimensionless signal-to-noise ratio parameter, where the signal is  0 2  2   2   3 (the prefactor of the first term in   () when  is rescaled by 1/  ) and the noise is  0 (the second term in   ()).
The zeros of   () are: as well as their complex conjugates,  1 * and  2 * .As  → 0, these will simplify to: Note that there are several equivalent forms for these zeros, and they change from being fully imaginary to complex when   > 1/4.
The poles of   () satisfy Power spectral densities of real, stable, causal systems can generally be decomposed into causal and anticausal parts ("Wiener-Hopf factorization") (17)(18)(19): where has zeros and poles with negative imaginary parts, and   * () is its complex conjugate.
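For a rational power spectrum, the factorization described here amounts to sorting zeros and poles by the sign of their imaginary part. The sketch below builds the causal factor from the roots with negative imaginary part, following the convention stated in the text, and checks that its squared magnitude reproduces the spectrum on the real frequency axis. The toy spectrum $(\omega^2 + b^2)/(\omega^2 + a^2)$ and the function name `causal_factor` are assumptions made for illustration; they are not the spectrum derived in this section.

```python
import numpy as np

def causal_factor(omega, zeros, poles, gain=1.0):
    """Evaluate the factor built only from zeros and poles with negative imaginary part."""
    kept_zeros = [z for z in zeros if z.imag < 0]
    kept_poles = [p for p in poles if p.imag < 0]
    num = np.prod([omega - z for z in kept_zeros], axis=0) if kept_zeros else 1.0
    den = np.prod([omega - p for p in kept_poles], axis=0) if kept_poles else 1.0
    return np.sqrt(gain) * num / den

# Toy spectrum S(omega) = (omega^2 + b^2) / (omega^2 + a^2), with roots in conjugate pairs.
a, b = 1.0, 3.0
zeros = [1j * b, -1j * b]
poles = [1j * a, -1j * a]

omega = np.linspace(-20.0, 20.0, 401)
spectrum = (omega ** 2 + b ** 2) / (omega ** 2 + a ** 2)
psi = causal_factor(omega, zeros, poles)

# On the real axis, psi times its complex conjugate recovers the spectrum.
reconstructed = (psi * np.conj(psi)).real
```

Because the roots come in conjugate pairs, the discarded half (positive imaginary part) is exactly what the complex conjugate factor supplies.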
Next, we need the causal part of (see Appendix A): One approach would be to compute the inverse Fourier transform of   ()   * () , apply the time shift forward by  implied by  −   , multiply the result by a Heaviside step function Θ(), and compute the Fourier transform of the result.An alternative approach is to compute the partial fraction decomposition of the expression above and keep only the terms with poles and zeros that have negative imaginary part: for unknown , , and .Only the pole of the first term ( = − 1 ) has negative imaginary part, so we only need to compute  to get the causal part of this expression.With some algebra, this is: and the causal part of   ()   * ()  −   is then: Finally, the optimal kernel that computes the mean of (( + )|{()}),  | (), is (Appendix A): after taking  to zero.We convert this kernel to the time domain and discuss its properties in the next section.
The variance of (( + )|{()}),  | 2 (), is (Appendix A, Eqn.127): where we used   2 =  2   2 = /(2  0   3 ).Then the correlation coefficient   2 () is: Finally, using Eqn.14 from above, we find that the behaviorally-relevant information available in particle counts is: Expanding around small SNR  gives: Note that for small signals, Eqn.44 can be written The optimal kernel in the time domain and the information rate remain real when   > 1/4, even though  ,1 and  ,2 become complex.In this regime, they can be written: The optimal kernel in frequency space can be written: the correlation coefficient at  = 0 can be written: and the information rate is: For small   , this reduces again to Eqn. 45.

Optimal kernel for estimating signal from particle arrivals
To get the time-domain kernel mapping past particle arrival rate () to signal ( + ),   (),we take the inverse Fourier transform of   (), defined as .   () has the form of a sum of two exponentials, with real exponents when   ≤ 1/4 and complex ones when   > 1/4.For   < 1/4, the kernel in the time domain is: where Θ() is the Heaviside step function, indicating that the kernel is indeed causal.
The optimal kernel   () essentially computes the time derivative of concentration, while also averaging out shot noise from particle arrivals.It has several notable features.First, it is biphasic and exhibits perfect adaptation, a hallmark of the chemotaxis pathway.Any derivative operation should adapt perfectly because it should only respond to changes in the input.It is interesting to examine how the time scales of the optimal kernel are set by the signal-to-noise ratio   = 2  0  2   2   3 .The initial response time scale is set by  ,1 −1 and its adaptation time scale is set by  ,2 −1 .
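Perfect adaptation of a biphasic kernel means that its time integral vanishes, so a sustained step in the input produces only a transient response. The sketch below illustrates this with a generic difference of two unit-area exponentials. This functional form and the two rates are assumptions chosen for illustration; the actual optimal kernel and its time scales are the ones derived in the text.

```python
import numpy as np

def biphasic_kernel(t, rate_fast, rate_slow):
    """Causal kernel: a fast positive lobe minus a slow negative lobe, each with unit area."""
    t = np.asarray(t, dtype=float)
    k = rate_fast * np.exp(-rate_fast * t) - rate_slow * np.exp(-rate_slow * t)
    return np.where(t >= 0, k, 0.0)

# Hypothetical rates (1/s): fast initial response, slower adaptation.
rate_fast, rate_slow = 5.0, 0.5

t = np.linspace(0.0, 50.0, 50001)
dt = t[1] - t[0]
k = biphasic_kernel(t, rate_fast, rate_slow)

# Trapezoid-rule integral of the kernel; perfect adaptation means this is ~0.
area = dt * (k.sum() - 0.5 * (k[0] + k[-1]))

# Response to a unit step input: rises, then relaxes back to zero.
step_response = np.cumsum(k) * dt
```

The step response peaks early and then decays back to zero, which is the behavior expected of any kernel that estimates a time derivative.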
When the inputs are very noisy, i.e. as   → 0,  ,1 −1 gets longer but saturates at   : This makes sense because it maximally averages out shot noise, but only for as long as past signals are correlated with the current signal.As the SNR increases, this initial averaging time gets shorter.
As the inputs get noisier, i.e. as   → 0, the adaptation time approaches: This shows that the adaptation time can become long compared to   when   < 1/4.
Interestingly, in this regime, the kernel   () has the same functional form as the phenomenological kernel we measured previously (3) (after transforming the input quantity from () to ()).
As the SNR   increases, the initial response time and the adaptation time both get shorter.Although the kernel oscillates, its decay rate is faster than the period of oscillations.The time scales of decay and oscillation are closest to each other, and thus the oscillation amplitude is largest, when   is large: in the limit that   → ∞, Re[ ,2 ] = Im[ ,2 ] =   1/4 .Even in this limit, the peak of the kernel following the first negative lobe occurs at time  = 3   −1/4 and is smaller than the kernel's maximum value (  ( = 0)) by a factor of  −3/2 ~ 0.009.Thus, the oscillations are small.  () transitions continuously between the forms above as   varies.
The results of this and the previous section could also be derived using the continuous-time Kalman-Bucy filter (20,21). That approach provides a pair of ODEs for the estimator of $s$ (i.e. the conditional mean $\mu_{s|r}$) and its uncertainty (i.e. the conditional variance $\sigma^2_{s|r}$) that are driven by the observations, $r(t)$. Once $\sigma^2_{s|r}$ reaches steady state in that formulation (consistent with our assumption of stationarity here), the ODE for $\mu_{s|r}$ can be solved in terms of a kernel convolved with past $r(t)$, which is identical to the optimal kernel above.

Modeling kinase activity
In shallow gradients, CheA kinases respond approximately linearly to recent signals. We model kinase responses, $a(t)$, to past particle arrival rates, $r(t)$, in background particle arrival rate $r_0$, as: The response function to particle arrival rate, $K_r(t)$, is: We note that in our previous work (3) we modeled kinase responses to signals, $s(t)$, directly. In the Methods section of the main text, we show how to convert between these representations (Equation 15 in the Methods).
The Fourier transform of this kernel is: Particle arrival noise filtered through this kernel has spectrum: In experiments, we measure responses to absolute changes in concentration $c(t)$, with response kernel $K_c(t)$, which has the same form as $K_r(t)$ above, but with gain $G_c$. Then, we convert $G_c$ to $G_r$ via $G_r = G_c/k_D$, and thus convert $K_c(t)$ to $K_r(t)$. With this, the intensity of filtered particle noise in Eqn. 61 is proportional to $G_r^2 r_0 = G_c^2 c_0/k_D$. This conversion implies that E. coli respond to every particle arriving at their surface, which is unlikely. Instead, one might use an effective $k_D^{eff} < k_D$ to do the conversion above, which would increase our estimate for the intensity of filtered particle noise, being proportional to $G_c^2 c_0/k_D^{eff}$. However, modeling the filtered particle noise with $k_D^{eff} = k_D$ maximizes our estimate of E. coli's information rate. Since we find that E. coli are far from the physical limit, this is a conservative modeling choice.
Next, we consider modeling noise in kinase activity. As explained in Fig. S3 above, the FRET system we use for measuring kinase activity has limited time resolution, about 0.3 s. This allows us to constrain slow fluctuations in kinase activity, whose correlation function is characterized by a single decaying exponential function (3,22): The parameters here are the long-time variance $\sigma_n^2$ and the correlation time $\tau_n$, which are related to the diffusivity of the noise by $D_n = \sigma_n^2/\tau_n$. The power spectrum of this noise is There can also be noise at higher frequencies that we don't observe. Kinase responses to particle arrival noise set a minimum noise level at all frequencies. At high frequencies, simply extrapolating the power spectrum in Eqn. 63 drops below the implied filtered particle noise in Eqn. 61 if we take $\tau_1$ in Eqn. 59 to be the value measured previously in biochemical studies (1,2), $\tau_1 \approx 1/60$ s. One possibility is that cooperativity of the receptor-kinase lattice slows the response, raising $\tau_1$ to a value closer to what we measure in FRET, $\tau_1 \approx 0.35$ s. In this case, extrapolating the slow noise to high frequencies does not cause any problems.
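A correlation function that decays as a single exponential corresponds to a Lorentzian power spectrum. The sketch below checks this numerically, assuming the transform convention in which the spectrum is the integral of the correlation function times $e^{-i\omega t}$ over all lags, which gives $2 D_n / (\omega^2 + \tau_n^{-2})$ with $D_n = \sigma_n^2/\tau_n$. That prefactor convention is an assumption made here and may differ from the one used in the original equations; the parameter values are illustrative only.

```python
import numpy as np

def noise_correlation(t, sigma2, tau_n):
    """Single-exponential correlation function with variance sigma2 and correlation time tau_n."""
    return sigma2 * np.exp(-np.abs(t) / tau_n)

def noise_spectrum(omega, sigma2, tau_n):
    """Lorentzian spectrum 2 * D_n / (omega^2 + tau_n^-2), with D_n = sigma2 / tau_n."""
    d_n = sigma2 / tau_n
    return 2.0 * d_n / (omega ** 2 + 1.0 / tau_n ** 2)

def numerical_spectrum(omega, sigma2, tau_n, t_max=400.0, n=400001):
    """Trapezoid-rule Fourier integral of the correlation function over [-t_max, t_max]."""
    t = np.linspace(-t_max, t_max, n)
    dt = t[1] - t[0]
    integrand = noise_correlation(t, sigma2, tau_n) * np.cos(omega * t)
    return dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

# Illustrative values: unit variance, correlation time of 10 s.
sigma2, tau_n = 1.0, 10.0
frequencies = [0.0, 0.1, 1.0]   # rad/s
analytic = [noise_spectrum(w, sigma2, tau_n) for w in frequencies]
numeric = [numerical_spectrum(w, sigma2, tau_n) for w in frequencies]
```

Well above $1/\tau_n$ the spectrum falls off as $2 D_n/\omega^2$, which is why a naive extrapolation of the slow noise eventually drops below any frequency-independent noise floor.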
To avoid having unphysical noise power at high frequencies, we take the total noise in kinase activity to be a sum of the measured slow noise in Eqn. 63 plus the filtered particle arrival noise in Eqn. 61. There are likely other noise sources at high frequencies, so this modeling choice maximizes our estimate of E. coli's information rate. Since we find that E. coli are far from the physical limit, this is a conservative modeling choice. Ultimately, even if we only model noise in kinase activity as being the slow, measurable noise, the effects on the numerical values of the information rate are small. Before continuing, we will make an additional simplifying assumption. The adaptation time of kinase responses, $\tau_2$, and the correlation time of kinase noise, $\tau_n$, are each roughly 10 s. Therefore, below we will also assume $\tau_2 \approx \tau_n$, which also has small quantitative effects on the results. These simplifications also allow us to derive interpretable analytical expressions.

Derivation of the behaviorally-relevant information rate in kinase activity
In this section, we derive the information about current signal encoded in the kinase activity of a typical E. coli cell. Here, we seek an expression for the following transfer entropy rate: Again, the calculation centers on calculating the mutual information between past kinase activity $a$ and signal at some time $\tau$ into the future, $I(\{a(t)\}; s(t+\tau))$. The quantity we need to derive this is the posterior distribution of signal given past kinase activity, $P(s(t+\tau) \mid \{a(t)\})$. Past measurements by us and others (3,22,23) have shown that kinase activity in wild type cells (i.e. cells with all receptor types and with their adaptation system intact) is well-approximated by a Gaussian process. Because of this, and because we consider shallow gradients, we only need the variance of $P(s(t+\tau) \mid \{a(t)\})$ to compute the mutual information to leading order in $g$ (see the section Derivation of the information in particle arrivals, above). Thus, we can approximate $s$ and $a$ as jointly Gaussian distributed.
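With the approximation that $s$ and $a$ are also jointly Gaussian distributed, $P(s(t+\tau) \mid \{a(t)\})$ is Gaussian, and therefore we again need to compute a mean $\mu_{s|a}(\tau)$ and a variance $\sigma^2_{s|a}(\tau)$. The mutual information can then be computed from:

$$I\big(\{a(t)\}; s(t+\tau)\big) = -\frac{1}{2} \log\big(1 - \rho^2_{sa}(\tau)\big),$$

and the predictive information rate follows from its time derivative at $\tau = 0$. Here, $\rho^2_{sa}(\tau) = 1 - \sigma^2_{s|a}(\tau)/\sigma_s^2$ is the generalized correlation between $s(t+\tau)$ and past $a$, or the fractional reduction of variance in $s(t+\tau)$ upon observing past $a$.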
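For jointly Gaussian variables, the mutual information depends only on the fractional reduction of variance described above. The sketch below assumes the standard Gaussian result, information equals minus one half of the logarithm of one minus the squared correlation, measured in nats. The variance values are hypothetical and chosen to mimic a shallow gradient, where observations reduce the uncertainty only slightly.

```python
import math

def correlation_squared(var_prior, var_posterior):
    """Fractional reduction of variance in the signal after observing the past."""
    return 1.0 - var_posterior / var_prior

def gaussian_information(rho2):
    """Mutual information (nats) for jointly Gaussian variables with squared correlation rho2."""
    return -0.5 * math.log(1.0 - rho2)

# Hypothetical shallow-gradient case: observations remove 1% of the signal variance.
rho2 = correlation_squared(var_prior=1.0, var_posterior=0.99)
info = gaussian_information(rho2)

# To leading order in a small correlation, the information is rho2 / 2.
info_leading_order = 0.5 * rho2
```

The leading-order form shows why, in shallow gradients, only the posterior variance is needed to obtain the information to lowest order.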
To compute the rate of information transfer from current signal () to kinase activity (), we need the conditional mean and variance of ( + ),  | () and  | 2 ().These in turn require deriving the kernel   () that maps past kinase activity  to the conditional mean,  | ().This can again be derived using Wiener filtering theory and expressed in terms of the power spectra of  and .These are: where   () = ⟨() ( + )⟩,   () = ⟨(() −  0 ) (( + ) −  0 )⟩, and   () = ⟨(() −  0 ) ( + )⟩.The first term in   () comes from responses to signals, the second term comes from filtered particle arrival noise, and the third term comes from internal kinase noise.For convenience, we will define We now need to decompose   () into the product of a causal and an anti-causal part by finding its zeros and poles.The zeros satisfy   ( =    ) = 0 are complex solutions to the equation: This can be written in terms of the particle arrival SNR,   = 2  0  2   2   3 , and the ratio of the diffusivity of filtered particle noise and the diffusivity of slow kinase noise,  = The zeros of   () are: as well as their complex conjugates.
The poles of   () satisfy , as well as their complex conjugates.
We decompose   () as: where ) has negative imaginary part, so we only need to compute  to get the causal part of this expression.This is: and the causal part of   ()/  * () is then: Finally, like   () in the section above,   () ∝ exp (− ) when  ≥ 0.
With these expressions, the optimal kernel that computes the mean of (( + )|{}) is (Appendix A): We discuss this kernel in the following section.
Eqns. 44, 52, and 89 for the information rates $\dot{I}^*_{s \to r}$ and $\dot{I}^*_{s \to a}$ are nearly exact, but make several assumptions. They require $r_0 \tau_v \gg 1$ so that we can approximate particle arrivals as Gaussian. They also use Gaussian approximations for the mutual information quantities $I(s(t); \{r\})$ and $I(s(t); \{a\})$, which are valid when these quantities are small (shallow gradients, small $g$). We used linear theory to model kinase responses, which is valid if deviations in kinase activity from baseline are small, i.e. when $g$ is small. And we ignored feedbacks in which responses to signals change the signal statistics that the cell experiences, again valid when $g$ is small. Each of these assumptions can break at a different characteristic value of $g$: for particle arrival rate, small $g$ means $\gamma_r \ll 1$; for kinase activity, small $g$ means $\gamma_a \ll 1$. That all said, Eqns. 44, 52, and 89 currently provide our best analytical insight into information transfer during chemotaxis.

Optimal kernel for estimating signal from kinase activity
To understand the kernel   () that constructs an estimate of the current signal, (), from past kinase activity, {}, we first multiply it by the kinase response function of particle arrivals,   ().This gives a composite kernel that effectively maps the past of particle arrivals , corrupted by kinase noise, to an estimate of the signal (): This composite kernel that effectively acts on particle arrivals has the same structure as the optimal kernel   () (Eqn.53) for directly constructing () from particle arrivals.It's biphasic and adapts perfectly, although with different time scales than   ().This means that   () attempts to invert the kinase response function   (), to the extent possible given the kinase noise   (), and then apply something as close as possible to the optimal kernel for particle counts,   ().
Taking this line of thinking further, the optimal kernel acting on particle counts,   (), is the kernel that the cell should try to implement (up to changes of units).However, the cell has to communicate information about the signal () through multiple chemical species in order to send them from the kinases at one location to the motors at various other locations.These steps impose constraints on the cell's signaling pathway, and they add noise.Despite this, the cell should be attempting to make its composite kernel from input (particle counts) to output (tumble rate) look like   ().

Information about current versus past signals encoded in kinase activity
We previously quantified the information about all past signals encoded in kinase activity, $\dot{I}_{s \to a}$, and found that E. coli use this information efficiently: they climb gradients at speeds near the information-performance limit (3). There are two possible inefficiencies that prevent E. coli from reaching the limit: first, cells might encode information about past signals, which doesn't contribute to gradient-climbing; and second, information about current signal can be lost in communication to the motor behavior. Now that we have an expression for the information about current signal $s(t)$ in kinase activity, we can distinguish between these two effects.
We defined the information about all past signals encoded in kinase activity using the following transfer entropy rate: The subset of this information that is relevant to chemotaxis is: which is the information we have considered here.How do these information rates compare to each other for the kinase response function and noise correlation function that we measured here and previously?
First, note that if kinase activity $a$ were Markovian in $s(t)$, then we would have and all information about signals encoded in kinase activity would be relevant to gradient climbing. Surprisingly, this means that a long response adaptation time does not necessarily degrade information about the current signal.
We can evaluate both of these information rates for the response and noise models used here. In the regime of shallow gradients and $\tau_2 \approx \tau_n$, the information about past and present signals is (3,25): We compare this to the information about current signal only derived in the previous section, Eqn. 91, reproduced below: The ratio of these two information rates has a particularly simple form: Thus, for $\dot{I}_{s \to a}$ to mostly carry information about current signal and be close to $\dot{I}^*_{s \to a}$, 1) the time scale of initial kinase response must be short compared to the signal correlation time, $\tau_1 \ll \tau_v$; and 2) the diffusivity of filtered particle noise must be small compared to that of internal kinase noise, $G_r^2 r_0 \ll 2 D_n$.
Using $\tau_1 = 1/60$ s from biochemistry studies, Refs. (1,2), we estimate that $\dot{I}^*_{s \to a} / \dot{I}_{s \to a} \approx 0.88 \pm 0.01$ in $c_0 = 1$ μM, and that this ratio increases as $c_0$ gets large or small. This suggests that E. coli's main source of "inefficiency" is that relevant information in kinase activity is lost in communication with the motors.
This result might appear to be in contradiction with the results of Ref. (16), which found that the fraction of predictive information about signals relative to past information about signals was very small (about 1%) in a model of E. coli's kinase activity, , and downstream readout molecules,  (CheYp).(Our  ̇→ * , being a predictive information rate, is very similar to their predictive information, while  ̇→ is very similar to their past information.)However, that study considered predictive and past information encoded in the current value of the readout molecule, (), instead of the entire history of readout molecules {}.This difference in how our information quantities are defined explains the large difference.
Kinase activity  and even CheY phosphorylation level downstream  are not the final outputs of the chemotaxis system.Instead, downstream pathway dynamics can act on the entire past of  or  to extract more information and make behavioral decisions.Therefore, the current values of () and () do not need to be faithful estimates of the current (or future) signal (); they just need to carry decodable information about () in their trajectories.Our information measures above account for this.
In On the left-hand side, we have the product of a causal function and a function that is nonzero for positive and negative time delays, the result of which is also nonzero for positive and negative time delays.On the right-hand side, we have a function that is nonzero for positive and negative time delays and an anticausal function.How do we get the optimal causal kernel   () out of this?
Naively, one might divide both sides by   () and then multiply element-wise by a Heaviside step function in the time domain to get a causal kernel   ().However, although the resulting kernel is causal, it does not satisfy the optimality condition.Plugging that kernel back into Eqn.114, it multiplies the non-causal   (), and the result is non-causal.Thus, () is non-causal, so that kernel does not satisfy the optimality condition, ( ′ ) = 0 for  ′ ≥ 0.
Instead, we need to split   () into causal and anti-causal parts, called a spectral factorization or Wiener-Hopf factorization (17)(18)(19): where () is a causal function in the time domain and its complex conjugate  * () is anti-causal.() is constructed by putting all poles and zeros of   () with negative real part into () and those with positive real part into  * ().

Fig. S4: Optimal kernel for inferring current signal, $s(t)$, from past particle arrivals, $\{r\}$. Colors indicate different values of the signal-to-noise ratio $\gamma_r$, marked on the right. Each kernel is normalized so that its value at $t = 0$ is 1.